TREASURY INSPECTOR GENERAL FOR TAX ADMINISTRATION

The Customer Account Data Engine 2 Database Was Initialized; However, Database and Security Risks Remain, and Initial Timeframes to Provide Data to Three Downstream Systems May Not Be Met

September 27, 2012

Reference Number: 2012-20-109

This report has cleared the Treasury Inspector General for Tax Administration disclosure review process and information determined to be restricted from public release has been redacted from this document.

Phone Number   | 202-622-6500
E-mail Address | TIGTACommunications@tigta.treas.gov
Website        | http://www.tigta.gov

HIGHLIGHTS

THE CUSTOMER ACCOUNT DATA ENGINE 2 DATABASE WAS INITIALIZED; HOWEVER, DATABASE AND SECURITY RISKS REMAIN, AND INITIAL TIMEFRAMES TO PROVIDE DATA TO THREE DOWNSTREAM SYSTEMS MAY NOT BE MET

Highlights

Final Report issued on September 27, 2012

Highlights of Reference Number: 2012-20-109 to the Internal Revenue Service Chief Technology Officer.

IMPACT ON TAXPAYERS

The overall goals for the Customer Account Data Engine 2 (CADE 2) Program are to process individual taxpayer account data in a modernized environment and provide more timely and accurate data to front-line employees. A transactional database capable of supporting both tax processing and enterprise-wide data access is a cornerstone of that effort. In Transition State 1, the IRS will establish the database and develop processes to keep the database current with daily account information from the Individual Master File. The database will be able to provide daily updates to the IRS’s key customer service database, the Integrated Data Retrieval System, and it will be able to populate the key compliance analytical database, the Integrated Production Model, with more timely data. Incomplete, inaccurate, and unsecured data on the CADE 2 database will prevent the IRS from providing quality customer service and could compromise taxpayer data.

WHY TIGTA DID THE AUDIT

The overall objective was to review the CADE 2 database implementation and ensure that the database was secure, accurate, and complete, and that prior weaknesses identified were corrected or mitigated. This review addresses the major management challenge of Modernization.

WHAT TIGTA FOUND

Our review determined that data integrity testing did not provide assurance that CADE 2 database data are consistently accurate and complete. Also, the CADE 2 database design has not fully met initialization, daily update, and downstream interface needs.

To address the issues identified during testing, the IRS developed version 2.2 of the CADE 2 database. The IRS spent up to $22.3 million on database implementation, including developing version 2.2 of the CADE 2 database, from January through July 2012. The IRS does not track cost at the development activity level; therefore, TIGTA could not determine the actual cost for version 2.2 of the CADE 2 database.

Enhanced security is one of the goals of the CADE 2 Program. CADE 2 database security will be implemented via a role-based access model and the Resource Access Control Facility. However, vulnerabilities in the JAVA code could result in loss of sensitive taxpayer information, and remediation of identified security weaknesses is ineffective.

WHAT TIGTA RECOMMENDED

TIGTA recommended that the Chief Technology Officer: 1) ensure the CADE 2 Program does not exit Transition State 1 until the CADE 2 database can provide accurate and complete data to the three downstream systems; 2) ensure the database design process follows the Internal Revenue Manual and validate that the database design meets business requirements; 3) realign data validation and testing efforts with business functionality and processes; 4) ensure JAVA code weaknesses are remediated; 5) ensure privileged accounts are documented, administered, monitored, and reviewed in accordance with the Internal Revenue Manual or removed from the system; 6) ensure sample tables and default ports are disabled or removed; and 7) enhance the Online 5081 system.

The IRS agreed with three and partially agreed with one of the seven recommendations, and corrective actions are planned. The IRS disagreed with three recommendations, and TIGTA provided comments in the audit report.

DEPARTMENT OF THE TREASURY
WASHINGTON, D.C. 20220

TREASURY INSPECTOR GENERAL FOR TAX ADMINISTRATION

September 27, 2012

MEMORANDUM FOR CHIEF TECHNOLOGY OFFICER

FROM:     Michael E. McKenney
          Acting Deputy Inspector General for Audit

SUBJECT:  Final Audit Report – The Customer Account Data Engine 2 Database Was Initialized; However, Database and Security Risks Remain, and Initial Timeframes to Provide Data to Three Downstream Systems May Not Be Met (Audit # 201220023)

This report presents the results of our review of the Customer Account Data Engine 2 Database Implementation to ensure that the database was secure, accurate, and complete, and that prior weaknesses identified were corrected or mitigated. This review addresses the major management challenge of Modernization.

Management’s complete response to the draft report is included in Appendix IV.

Copies of this report are also being sent to the Internal Revenue Service managers affected by the report recommendations. Please contact me at (202) 622-6510 if you have questions or Alan R.
Duncan, Assistant Inspector General for Audit (Security and Information Technology Services), at (202) 622-5894.

Table of Contents

Background

Results of Review
    Design Issues Are Jeopardizing the Ability of the Customer Account Data Engine 2 Database to Serve As a Trusted Source of Data
        Recommendation 1
        Recommendation 2
        Recommendation 3
    Security Weaknesses and Poor Coding Practices in the Customer Account Data Engine 2 Database Could Result in the Loss of Taxpayer Data
        Recommendation 4
        Recommendations 5 through 7

Appendices
    Appendix I – Detailed Objective, Scope, and Methodology
    Appendix II – Major Contributors to This Report
    Appendix III – Report Distribution List
    Appendix IV – Management’s Response to the Draft Report

Abbreviations

CADE 2    Customer Account Data Engine 2
ETL       Extract, Transform, and Load
IDRS      Integrated Data Retrieval System
IMF       Individual Master File
IRM       Internal Revenue Manual
IRS       Internal Revenue Service
TIF       Taxpayer Information File
TS1       Transition State 1

Background

The Customer Account Data Engine 2 (CADE 2) Program is the top information technology modernization project in the Internal Revenue Service (IRS). The CADE 2 mission is to provide state-of-the-art individual taxpayer account processing and data-centric technologies to improve service to taxpayers and enhance IRS tax administration. CADE 2 will replace the current Individual Master File (IMF) account settlement system with a modernized, relational database processing system and become a key component in the IRS’s enterprise-wide, data-centric information technology strategy.
Figure 1 provides the CADE 2 system implementation phases.

Figure 1: CADE 2 System Implementation Phases

Phase                       Description
Transition State 1 (TS1)    The IRS will establish a single database that will store all individual
                            taxpayer accounts. Processing will be enhanced to include daily batch
                            processing. The key IRS customer service operational database, the
                            Integrated Data Retrieval System (IDRS), will have the benefit of more
                            timely posted data. The solution will populate the Integrated Production
                            Model analytical data store and provide business users with tools to more
                            effectively use the data for compliance and customer service. Enhanced
                            data security will be in place. Downstream systems that must be modified
                            to support daily processing are included in the scope of the TS1.
Transition State 2          A single processing system will be implemented. Applications will directly
                            access and update the taxpayer account database, and continued efforts will
                            be made in addressing existing financial material weaknesses. The IRS
                            planned to implement Transition State 2 in January 2014. This date is no
                            longer viable, due to funding delays, but a new date has not been
                            determined.
Target State                Implement a single system in which all transitional applications are
                            eliminated. The complete solution is also planned to address all the
                            financial material weaknesses. As of April 3, 2012, the IRS had not
                            established a Target State implementation date.

Source: The CADE 2 Program Charter and meetings with the CADE 2 executives.

TS1 will move the IRS away from operating in two tax processing environments – the IMF and Current CADE – towards a single system for managing individual taxpayer accounts. It has two major implementation pieces: Daily Processing and Database Implementation. Daily Processing, which uses IMF files and not the CADE 2 database, went into production in January 2012. IRS management stated that this has resulted in providing faster refunds for millions of taxpayers and that posted information was viewable on the IDRS within 48 hours of processing.

Database Implementation, which is the subject of this audit, is in the final testing stage for version 2.2, which is expected to be placed into production in late 2012. Version 2.1 of the database was initialized earlier in Calendar Year 2012. IRS management stated that this earlier version of the database successfully initialized 270 million individual taxpayer accounts and more than a billion tax modules while balancing to the penny.

Within TS1, the primary deliverable of the CADE 2 Database Implementation project is a relational database that will store individual taxpayer account data, currently being processed by the IMF.
This database will serve as the trusted source of data for three critical downstream systems: Corporate File On-Line/Individual Master File On-Line, IDRS Taxpayer Information File (TIF), and the Integrated Production Model. In Transition State 2, the CADE 2 database will become the sole source of IMF data and become the system of record for individual tax account processing, as the IMF entity and tax module files will be retired.

This audit reviewed the steps taken by the IRS to prepare for the CADE 2 Database Implementation and examined the process for addressing weaknesses and issues within TS1. This review was performed at the IRS Information Technology headquarters office in New Carrollton, Maryland, during the period February through June 2012. We conducted this performance audit in accordance with generally accepted government auditing standards. Those standards require that we plan and perform the audit to obtain sufficient, appropriate evidence to provide a reasonable basis for our findings and conclusions based on our audit objective. We believe that the evidence obtained provides a reasonable basis for our findings and conclusions based on our audit objective. Detailed information on our audit objective, scope, and methodology is presented in Appendix I. Major contributors to the report are listed in Appendix II.

Results of Review

Design Issues Are Jeopardizing the Ability of the Customer Account Data Engine 2 Database to Serve As a Trusted Source of Data

The CADE 2 database is the cornerstone for all CADE 2 system development. One of the primary goals of the CADE 2 database is for it to be a trusted source of data. To provide this, it needs a stable design built to support tax processing functions and the assurance of complete and accurate data. Without these, the CADE 2 Program will not be successful. In TS1, the CADE 2 database will be initialized with IMF entity and tax module data, updated on a daily basis to keep it synchronized with the IMF files, and serve as a trusted source of data for selected downstream systems. The database must, therefore, provide sufficient evidence that its data are accurate and complete, and that it will provide the design needed for continued data reliability.
This will be critical when CADE 2 transitions into a transactional database processing system and the system of record for all individual taxpayer accounts.

Our review of the CADE 2 Database Implementation project found weaknesses in the data validation process and determined that the database design was not fully validated against business needs.

Data integrity testing did not provide assurance that CADE 2 data are consistently accurate and complete

The IRS cannot ensure the data on the CADE 2 database are consistently accurate and complete despite current control procedures and data integrity testing efforts. The Internal Revenue Manual (IRM) defines data controls as activities or tasks employed to preserve the accuracy of data by either deleting, detecting, or preventing operator errors, and by providing assurances that data are not lost, added, or inadvertently changed.[1] The IRM also provides that testing be conducted to ensure system components are free of logic and design errors, and customer requirements are satisfied.[2] In addition, the IRM requires that business requirements and business functions be fully documented during the business analysis phase.[3]

The IRS has data integrity checks in place at several levels of the CADE 2 database: data field level, record level, account and file level, and Master File level. Figure 2 summarizes the data integrity checks performed at each of these levels.

[1] IRM 2.5.3, Systems Development - Programing Techniques and Source Code Standards.
[2] IRM 2.6.1, Product Assurance - Test, Assurance and Documentation.
[3] IRM 2.5.13, Systems Development - Database Design Techniques and Deliverables.

Figure 2: Types of Data Validation Performed on the CADE 2 Database

Control Level             Approach                          Description
Data Field Level          Data Integrity Validation         Validates data values between the CADE 2 database and
                          Approach                          the IMF. Uses a combination of manual and systemic
                                                            data comparisons and data transformation rule
                                                            validation.
Record Level              Database Referential              Ensures that every record inserted into a table has a
                          Integrity Checks                  valid relationship to an existing account on the
                                                            database.
Account and File Level    Balance and Control               Checks counts and amounts between the IMF source files
                          Procedures                        and the CADE 2 database. Record counts and module
                                                            balance amounts are checked through the use of control
                                                            records during the Extract, Transform, and Load (ETL)
                                                            process.
Master File Level         Database Implementation           Balances to the IMF Recap Report.
                          Simplified Financial Report

Source: Treasury Inspector General for Tax Administration analysis of IRS documents.

During the first initialization of the CADE 2 database using version 2.1.1 of the data model, documentation showed that data validation efforts were adequate at the record level, account and file level, and Master File level. Referential integrity was maintained within the database, file control records were balanced to the CADE 2 database counts and amounts, and total financial assessments, credits, and debits balanced to the IMF Recap Report. However, at the data field level, the Data Integrity Validation Approach did not provide assurance that all the data values loaded into the CADE 2 database were accurate and complete. This was due to the complexity of many of the data transformation rules and embedded business logic contained within IMF data fields.

Manual and systemic data comparisons, when combined, validated approximately 70 percent of the data columns on the CADE 2 database against their IMF source values. Systems Acceptability Testing tested the remaining 30 percent through data transformation rule tests. The IRS acknowledged that these tests could not ensure the accuracy of the remaining 30 percent because these tests were limited and did not cover all variations or conditions of transformation logic.

Further, in a May 2012 meeting, IRS management acknowledged that not all variations or conditions in these complex data transformations had been identified.
The IRS CADE 2 Full Data Coverage Mapping document mapped data elements to subject areas within the CADE 2 database, but it did not map data elements to core IMF business functions or processes. Thus, the IRS could not identify data elements that supported some business functions or determine where business logic may have been embedded in IMF data elements. As a result, the IRS is encountering unanticipated data values and hidden business logic during the database load and update process. Without a complete list of all data elements, values, and business processes, the IRS could not design an adequate strategy for data validation at the data field level.

The CADE 2 database will contain data elements from the IMF entity and tax module files. However, additional data elements necessary for the IDRS TIF will be loaded into the CADE 2 database during the daily update process. This TIF data will include notice data, which will be used to address existing financial material weaknesses. To validate this data, the IRS intends to use an IDRS TIF comparison tool to compare the data extracted from the CADE 2 database to the data currently being sent to the IDRS TIF by the IMF. Development of the tool was not completed as of May 2012; therefore, we could not verify the effectiveness of this planned data validation effort.

Although the IRS designed a fairly comprehensive strategy to check the data integrity of the CADE 2 database, it did not conduct a proper Business Analysis to align IMF data elements with business processes and business requirements before attempting to initialize and update the CADE 2 database. It is therefore impossible for the IRS to verify that the data transformation rules used to load and update the database are complete and that all embedded business logic and system conditions contained in IMF data fields are accounted for and tested. Without a documented inventory of business processes and their supporting data elements, the IRS cannot verify the accuracy and completeness of the CADE 2 database. The database should not be used as a trusted source of data until a method to validate the accuracy of the data is developed.

The CADE 2 database design has not fully met initialization, daily update, and downstream interface needs

A logical data model defines the structures of the data for a database. The logical data model is designed from data requirements that support a set of business processes derived from business requirements. The CADE 2 database was designed using the IMF’s DB2 database, the Current CADE database, and the IMF’s core record layouts. In TS1, the goal was to initialize and update the CADE 2 database on a daily basis with data from the IMF system. The CADE 2 data migration was performed through the ETL process, during which data were extracted from the source IMF system, transformed to fit into the destination CADE 2 database, and loaded into the CADE 2 database. The ETL process used rules and functions to transform the IMF source data into the data loaded into the CADE 2 database.
Transformation rules and functions should be developed from an analysis of business functions, business processes, and business requirements.

During initialization of the CADE 2 database in January 2012, the IRS discovered that the Taxpayer Delinquent Account data field contained embedded business logic and was being used for more than one purpose and for more than one business process. For years, the IRS policy has allowed changes to the IMF data structures only once a year; therefore, the IMF developers would use existing bits and/or bytes of the IMF data structures in order to support new business requirements. This practice of using embedded business logic was not always documented, and in this instance led to programming issues during the ETL process. The programming issue forced the IRS to redesign the database model. The IRS had recorded the Taxpayer Delinquent Account data field issue in October 2011. However, the database initialization phase proceeded with version 2.1.1 of the data model without any remedy for the issue. Proceeding with the database initialization was not in accordance with an independent contractor’s recommendation that stated: “Database implementation teams do not compromise quality for the sake of hitting the schedule as this is likely to result in more painful re-work in the future.”

The daily update testing, performed by the IRS in February 2012, revealed two other business requirements that were not accounted for in the database design. The first involved processing in which the IRS overlays data in an original transaction when that transaction is reversed. To accommodate the recording of the reversal, the IRS had to create a history table on the CADE 2 database. The second missed requirement dealt with taxpayer account merges. The database had to accommodate the situation where the Social Security Number of the taxpayer account changed, whether it was due to identity theft or other circumstances. The unique key[4] used on the IMF sequential files could not be used on the CADE 2 relational database because a piece of the unique key had changed. This impacted database indexes and referential integrity checks. The IRS had to redesign the database with a new unique key that would not be impacted by account merges.

[4] The IMF unique key is a combination of the Taxpayer Identification Number, the type of tax, and the tax year.

During a Program Management Office meeting in June 2012, the IRS acknowledged that it was having problems with its CADE 2 database interface to the IDRS TIF. The CADE 2 program’s architecture solution planned to re-use the existing IMF to IDRS TIF interface for the CADE 2 database interface to IDRS TIF. However, the data types being extracted and sent from the CADE 2 database were not what the IDRS TIF system was expecting. Zeroes, blanks, and null values were not being transformed correctly; therefore, the IDRS TIF could not process the incoming CADE 2 database data successfully. As a result, the IRS is re-evaluating its data strategy for feeding downstream systems and is considering delaying the IDRS TIF interface. Figure 3 presents a partial CADE 2 database development timeline.

Figure 3: Partial CADE 2 Database Development Timeline

Source: The CADE 2 Integrated Master Schedule dated May 30, 2012, other IRS documentation, and interviews of IRS personnel.

To address the issues identified during testing, the IRS developed version 2.2 of the CADE 2 database. The IRS spent up to $22.3 million on database implementation, including developing version 2.2 of the CADE 2 database, from January through July 2012. The IRS does not track cost at the development activity level. Therefore, we could not determine the actual cost for version 2.2 of the CADE 2 database. These costs could have been avoided by properly identifying the business requirements up front and including these requirements in the original design.

Recommendations

The Chief Technology Officer should:

Recommendation 1: Ensure that the CADE 2 Program does not exit TS1 until the CADE 2 database can provide accurate and complete data to the IDRS TIF, Corporate File On-Line/Individual Master File On-Line, and to the re-evaluated Integrated Production Model.

    Management’s Response: The IRS agreed with the recommendation.
With\n        appropriate approvals from the CADE 2 governance committee, the IRS plans to exit\n        Milestone 5 according to schedule in September 2012, in order to deploy planned\n        Corporate Files On-Line/Individual Master File On-Line and Integrated Production\n        Model Reports functionality. The milestone exit will be conditional, however, until such\n        time as the IRS has deployed planned IDRS TIF functionality, which will be done upon\n        completion of the 2013 Filing Season peak.\n\n\n                                                                                                     Page 7\n\x0c               The Customer Account Data Engine 2 Database Was Initialized;\n                 However, Database and Security Risks Remain, and Initial\n                Timeframes to Provide Data to Three Downstream Systems\n                                    May Not Be Met\n\n\nRecommendation 2:\xc2\xa0\xc2\xa0Ensure that the database design process follows the IRM and validate\nthat the database design meets business requirements.\n      Management\xe2\x80\x99s Response: The IRS disagreed with this recommendation. IRS\n      management stated that the database design approach is extremely sound and meets the\n      IRM standard. It leverages the legacy IMF, which has undergone years of refinement and\n      embodies business requirements that are complete and accurate. The database fully\n      supports the business requirements, as the CADE 2 data model was built using historical\n      lessons learned from previous successes in the CADE and production data from the IMF.\n      An added layer of confidence to the IRS approach was gained by running through\n      transformations using real taxpayer data, which further proved out that the data model\n      and database design was very strong. 
The CADE 2 data model design approach provided\n      an in-depth understanding of the current nuances of the IRS\xe2\x80\x99s taxpayer data and allowed\n      the IRS to easily introduce new fields to address the financial material weakness. In\n      total, only seven material change requests to the data model have been approved since it\n      was built three years ago. One of these included the change to upgrade to data model\n      version 2.2, which was framed as a \xe2\x80\x9credesign\xe2\x80\x9d in the Treasury Inspector General for Tax\n      Administration\xe2\x80\x99s audit report. In fact, the IRS resolved the issue with a minor\n      modification to the data model, which is another clear indicator that the CADE 2 data\n      model is stable.\n      Office of Audit Comment: The IRM 2.5.13 standard requires that the database design\n      process deliver a set of documents:\n          a) Decision Analysis and Description Forms.\n          b) Task Analysis and Description Forms.\n          c) Task/Data Element Usage Matrix.\n          d) Data Models.\n          e) Entity-Attribute Lists.\n          f) Data Definition Lists.\n          g) Physical Database Specification Document.\n      This set of documents validates that the database design supports the business\n      requirements. We did not receive the necessary documents to confirm that the database\n      design supports the business requirements. 
Further, the Executive Status Update on\n      September 12, 2012, stated there is a delay in clearing the backlog of defects identified\n      during the data validation activities.\n\nRecommendation 3: Realign data validation and testing efforts with business functionality\nand processes.\n       Management\xe2\x80\x99s Response: The IRS disagreed with this recommendation. The IRS\n       stated that full data coverage mapping and its data validation and testing approach\n       leveraged business requirements that are implicit in the IMF and have stood the test of\n       time. Additional mapping exercises at the data element level as recommended would add\n       little value to the process. As part of the validation approach, the IRS does mock testing\n       in production simulation environments. It does testing using production data \xe2\x80\x93 a\n       sampling of 2.5 million returns \xe2\x80\x93 to ensure integrity of data. A considerable portion of\n       the integrity testing, for example, has been designed to ensure that outputs from the\n       CADE 2 database \xe2\x80\x93 through Individual Master File On-Line or Taxpayer Identification\n       File outputs (future) \xe2\x80\x93 either match the parallel output from the legacy Master File or fall\n       into a small set of \xe2\x80\x9cacceptable\xe2\x80\x9d differences. 
With the business organization fully engaged\n       throughout all phases of this data integrity testing and review, the comprehensive\n       top-down and bottom-up approach for data verification has been extremely effective in\n       discovering issues, which are certainly to be expected on projects the size and magnitude\n       of the CADE 2 system. As correctly described in the report, the functional (Systems\n       Acceptability Testing) tests are also verifying whether the transformation rules were\n       implemented per specification. While it is not reasonable to think that any validation\n       approach will cover every possible combination and permutation of those rules, it\n       provides reasonable risk mitigation to complement the IRS\xe2\x80\x99s high-volume data validation\n       testing.\n       Office of Audit Comment: We agree that business requirements are implicit in IMF\n       data. However, we found no evidence of these requirements being traced to either data\n       elements on the CADE 2 database or the transformation rules used to load IMF data into\n       the CADE 2 database. Without verifiable evidence, data elements and values may have\n       been missed. The IRS\xe2\x80\x99s data integrity plan noted that the more complex data\n       transformation rules were tested by Systems Acceptability Testing and that, for many of\n       these fields, Systems Acceptability Testing could not cover all the permutations or\n       conditions of the transformation logic. Therefore, there is no way of knowing which data\n       fields or values may have been missed or left untested, or what business requirements\n       they were needed to support. 
Only an alignment of data elements to business\n       functionality and a testing effort based on that alignment would ensure the degree of data\n       validation necessary for the CADE 2 database to become the IMF\xe2\x80\x99s authoritative file of\n       record.\n\nSecurity Weaknesses and Poor Coding Practices in the Customer\nAccount Data Engine 2 Database Could Result in the Loss of\nTaxpayer Data\nEnhanced security is one of the goals of the CADE 2 Program. CADE 2 database security will\nbe implemented via a role-based access model and the Resource Access Control Facility.5\nSecurity will remain a key concern until role-based access is developed and fully implemented\nacross the IRS.\n\nVulnerabilities in the JAVA code could result in loss of sensitive taxpayer\ninformation\nIn designing systems, the IRS must meet security requirements from multiple sources. The\nNational Institute of Standards and Technology publishes the Federal Information Processing\nStandards, which provide the requirements for encryption to be used by governmental systems\nto prevent anyone without the necessary credentials from being able to ascertain the data\nstored on computer systems.\nThe IRS and the Department of the Treasury also have established standards for systems\noperating on their networks. For example, one IRS policy requires that passwords be\nchanged after a set number of days and that each password exceed a specific number of\ncharacters and include certain types of characters. 
Another requirement is that test code and\nexample database tables and components must be removed from an application.\nIn October 2011, two independent contractors conducted source code security reviews of the\nbalance and control module. Figure 4 provides the JAVA code weaknesses identified by the\nsource code security reviews.\n\n5 An IBM security system that provides access control and auditing functionality for the z/OS and z/VM operating\nsystems.\n\nFigure 4: JAVA Code Weaknesses\n\nHigh Risk        Moderate Risk                       Low Risk\n- SQL Injection  - Bug: Incorrect Logical Operator   - Dead Code\n                 - Insecure Algorithm                - Detailed Error Messages\n                 - Insufficient Input Validation     - Improper Logging\n                 - Insufficient Password Management  - Information Exposure\n                 - Use of Inner Classes              - Test Code\n                                                     - Unreleased Resources\nSource: Contractor Source Code Review, IRS Wage and Investment Business Unit\xe2\x80\x99s CADE 2 Database\nImplementation TS-1 JAVA Code, Balance and Control Module Core Module.\n\nBoth contractors recommended that the 
weaknesses identified in Figure 4 be corrected. The\nCADE 2 Governance Board deemed the overall risk to the database application as low, stating that the\ncode is hard to exploit and that it will be removed after the second database initialization. The\nGovernance Board accepted the security weaknesses contained in the JAVA initialization code\nand will not take any remediation actions. IRS management also advised that the JAVA code\nwas developed for one-time use.\nHowever, the IRS has used this JAVA code multiple times in testing. The JAVA code was also\nused to initialize the production database in March 2012 with data model version 2.1.1, and it will\nbe used to initialize the database with data model version 2.2 in the summer of 2012. Given this\nrepeated use, the JAVA code does not appear to have been developed for one-time use.\nRemediation of the weaknesses will enhance the JAVA database initialization balancing and\ncontrol code and enhance the security of the database. Ineffective password management, an incorrect\nlogical operator, dead code, and test code could result in the loss of Personally Identifiable\nInformation data, loss of reputation, and loss of taxpayers\xe2\x80\x99 trust. Dead code also could impact\nperformance of the database initialization, and test code could be executed during initialization.\nThis could result in data not being accurate or complete in the CADE 2 database.\n\nRemediation of identified security weaknesses is ineffective\nThe IRS performed mainframe database security testing on its IBM mainframe systems using the\nIBM Guardium scanner in December 2011 and March 2012. The Guardium scanner reviewed\nall subsystems on the database management system, including CADE 2. As the scan\nencompassed more than just CADE 2, the weaknesses related specifically to the CADE 2\nsubsystem could not be easily identified. The March 2012 Guardium scan identified\n67 weaknesses, of which 49 were deemed critical and 18 were deemed major. 
We compared the\ncritical weaknesses identified in the December 2011 and March 2012 scans and concluded the\nweaknesses were mostly repeat findings. Figure 5 summarizes the comparison of the scan\nresults.\n\nFigure 5: December 2011 Versus March 2012\nGuardium Scan Identified Weaknesses\n\nDate of Scan   Scan Type         Number of Critical  Number of Major  Total Identified\n                                 Weaknesses          Weaknesses       Weaknesses\nDecember 2011  Privileged Users  47                  2                49\n               Configuration     2                   16               18\nMarch 2012     Privileged Users  47                  2                49\n               Configuration     2                   16               18\nSource: Treasury Inspector General for Tax Administration analysis of the Guardium scan results.\n\nWeaknesses identified among privileged user accounts included users with unauthorized access\nto tables, packages, and files. Configuration weaknesses are related to default ports and an\nenabled demo table. 
The IRM states that default sample databases, along with any associated\nobjects and user accounts, are to be removed.6 These default databases and tables utilize default\naccounts, passwords, and ports. In addition, default ports with known vulnerabilities should not\nbe utilized.7 Figure 6 provides examples of repeat weaknesses that were identified by the\nGuardium scanner.\n\nFigure 6: Examples of Repeat Weaknesses Identified By Guardium\n\nRule Description                                             Number of Exceptions\n                                                             December 2011  March 2012\nLOAD privilege has been granted to unauthorized users.       104            111\nSYSADM privilege has been granted to unauthorized users.     3              3\nCREATEDBA privilege has been granted to unauthorized users.  13             13\nOne or more sample databases have been found.                6              6\nSource: Treasury Inspector General for Tax Administration analysis of the Guardium scan results.\n\n6 IRM 10.8.21, Information Technology (IT) Security - Database Security Policy.\n7 IRM 10.8.21, Information Technology (IT) Security - Database Security Policy.\n\nPrivileged user accounts are those accounts with elevated privileges which are used 
to maintain\nand administer systems or to perform tasks. Privileged user accounts include service, database\nadministrator, and system administrator accounts. The Online 5081 system is used to record all\naccess requests and document the semi-annual review of privileged user accounts as required by\nthe IRM.8\nWe reviewed the administration of privileged user accounts by selecting a judgmental sample of\nfive service accounts and five database administrator accounts.9 Supporting evidence for the\nfive service accounts was not documented in the Online 5081 system. Therefore, we could not\nidentify the purpose of these accounts. The Online 5081 system retains only the last review date,\nso we were unable to verify that the semi-annual reviews were performed on all 10 privileged\nuser accounts. Further, we were unable to determine whether privileged user access authority is\nappropriate and commensurate with job role and responsibilities because this information was not\navailable in the Online 5081 system. These gaps could result in unauthorized access and loss of\nPersonally Identifiable Information, unauthorized changes to the database, and loss of data\nintegrity.\nThe CADE 2 database was developed and implemented in a short time period and accounts were\nmigrated from existing legacy systems. As a result, default tables and ports were overlooked and\nwere not removed. In addition, when the IRS migrated to the Online 5081 system, validation for\naccuracy and completeness was not conducted and historical records were lost. Using default\nports and enabling a demo table also increases the IRS\xe2\x80\x99s vulnerability.\n\nRecommendations\nThe Chief Technology Officer should ensure:\nRecommendation 4: JAVA code weaknesses are remediated to enhance security and\nefficiency of the JAVA code.\n           Management\xe2\x80\x99s Response: The IRS agreed with this recommendation. 
Although the\n           CADE 2 governance has assessed the actual code weakness as low and a risk-based\n           decision was made to accept it, there are processes now in place where the developers\n           provide the code to the Cybersecurity organization for review prior to code promotion.\n           The Cybersecurity organization provides weakness feedback and the cycle is repeated\n           until all code weaknesses have been addressed. Additionally, the developers now use an\n           automated code review tool as part of their own development process. Finally, a decision\n           (i.e., fix the code, remove it permanently, or accept the risk) will be made as to the final\n           disposition of this code prior to the CADE 2 Milestone 5 exit.\n\n8 IRM 10.8.1, Information Technology (IT) Security - Policy and Guidance.\n9 A judgmental sample is a nonstatistical sample, the results of which cannot be used to project to the population.\n\nRecommendation 5: Privileged user accounts are properly documented, administered,\nmonitored, and reviewed in accordance with the IRM or removed from the system.\n      Management\xe2\x80\x99s Response: The IRS agreed with this recommendation. 
The IRS\n      stated that it should be noted that the Treasury Inspector General for Tax Administration\n      did not take into account, as part of this audit, any risk-based decisions around privileged\n      user accounts or the fact that the IBM Guardium scanner\xe2\x80\x99s predefined \xe2\x80\x9ccritical\xe2\x80\x9d\n      weakness levels are not necessarily correct for the CADE 2 environment.\n      Notwithstanding, enterprise-level remediation plans are being developed to address\n      validated scan findings.\n      Office of Audit Comment: Our analysis was based on the aggregation of the\n      weaknesses identified in each rule by the Guardium scans dated December 2011 and\n      March 2012 and not the default rating by the IBM Guardium scan.\nRecommendation 6: Sample tables and default ports are disabled or removed prior to the\nCADE 2 Program exiting TS1.\n      Management\xe2\x80\x99s Response: The IRS disagreed with this recommendation. The IRS\n      stated that IRM 10.8.21.5.4.2 does not explicitly list the use of default ports as forbidden.\n      Changing the default DB2 port is a massive technology undertaking and does not add\n      significantly to the level of security; therefore, doing so should not be taken lightly.\n      Additionally, the default DB2 port impacts all risk-based applications on the Master File\n      platform, not just CADE 2, and changes could jeopardize access to vital tax\n      administration applications. Nonetheless, as stated in Corrective Action 5, changing the\n      default port will be taken into consideration as part of an enterprise risk mitigation\n      remediation plan.\n      Office of Audit Comment: Ports with known vulnerabilities should not be used when\n      possible. If these ports are to be used in production, the port setting should be set to\n      \xe2\x80\x9cdisable broadcast.\xe2\x80\x9d In addition, default tables were identified in the December 2011\n      Guardium scan. 
The same default tables were identified in the March 2012 Guardium\n      scan. The IRM states default tables should be disabled or deleted.\nRecommendation 7: The Online 5081 system is enhanced to retain and display the\nlast two review dates.\n      Management\xe2\x80\x99s Response: The IRS partially agreed with this recommendation. The\n      IRS is reviewing the possibility of loss of historical records during the migration to the\n      Online 5081 system, as reflected in the audit report. If, upon completion of that review,\n      the IRS finds significant risks to the CADE 2 database, the Chief Technology Officer will\n      work with the Director, Agency-wide Shared Services, and the owner of the Online 5081\n      system to consider ways to mitigate the vulnerabilities. If the mitigation strategy\n      suggests enhancements to the Online 5081 system, the IRS will make that decision by\n      weighing the risks to the CADE 2 database against the costs in time and resources to do\n      the system enhancements.\n      Office of Audit Comment: The Online 5081 system is missing historical information\n      such as account creation date, purpose of the account, and the last two review dates. 
The\n      IRM requires this information to be documented and maintained.\n\nAppendix I\n\nDetailed Objective, Scope, and Methodology\n\nThe overall objective was to review the CADE 2 database implementation and ensure that the\ndatabase was secure, accurate, and complete, and that prior weaknesses identified were corrected\nor mitigated. To accomplish our objective, we:\nI.         Reviewed the architectural configuration for the database environment to identify control\n           points and ensure weaknesses are identified and mitigated.\n           A. Reviewed the architectural diagram for the application environment and used it to\n              identify potential control weaknesses.\n           B. Determined the impact to the database, impacted systems, and taxpayers of any\n              control weaknesses not mitigated.\nII.        Determined if the database is properly secured.\n           A. Reviewed the results of two independent assessments performed by contractors.\n           B. Determined if weaknesses identified by the December 2011 Guardium scan were\n              corrected.\n           C. Determined if the database is secured and privileged user accounts are limited,\n              monitored, and reviewed. There were 383 privileged user accounts, and we\n              judgmentally selected1 five service accounts and five database administrator accounts\n              for review.\n           D. 
Determined if default (demo) tables were properly secured, removed, or disabled.\nIII.       Determined if data integrity controls are developed and operating as designed to ensure\n           data are accurate and complete.\n           A. Interviewed a balancing and control subject matter expert and the ETL subject matter\n              expert for the database and obtained documents detailing the balance and control\n              policy, procedures, and processing.\n           B. Determined if weaknesses identified in the ETL process were corrected or mitigating\n              controls were developed and implemented.\n           C. Reviewed any other code reviews performed on the CADE 2 database, including\n              cycle synchronization and daily updates.\n\n1 A judgmental sample is a nonstatistical sample, the results of which cannot be used to project to the population.\n\n           D. Ensured that data transfers between input data sources and the audited database are\n              complete and accurate.\n           E. Determined if the processes for ensuring database consistency during cycle\n              synchronization and daily update address the accuracy and completeness of data.\nIV.        Reviewed downstream system/application interfaces and impact(s).\n           A. Interviewed the subject matter expert for system interfaces to gain an understanding\n              of how system interfaces and impact are determined.\n           B. 
Determined if interfaces are secured.\nInternal controls methodology\nInternal controls relate to management\xe2\x80\x99s plans, methods, and procedures used to meet their\nmission, goals, and objectives. Internal controls include the processes and procedures for\nplanning, organizing, directing, and controlling program operations. They include the systems\nfor measuring, reporting, and monitoring program performance. We determined the following\ninternal controls were relevant to our audit objective: the IRM, related CADE 2 documents, and\nguidelines and processes in the development of the CADE 2 database. We evaluated these\ncontrols by conducting interviews and meetings with management and staff, attending CADE 2\nDatabase Implementation meetings, and reviewing CADE 2 Program documentation and\nCADE 2 Database Implementation documents such as the CADE 2 Program Charter, CADE 2\nSolution Architecture, CADE 2 Database Implementation Test Plan, CADE 2 Program\nManagement and Integration Plan, CADE 2 Program Road Map, CADE 2 Interface Control\nDocument, and other documents that provided evidence of whether IRM systems testing\nprocesses were followed and whether those processes were adequate and operating as designed.\n\nAppendix II\n\nMajor Contributors to This Report\n\nAlan R. Duncan, Assistant Inspector General for Audit (Security and Information Technology\nServices)\nDanny R. Verneuille, Director\nLarry W. Reimer, Audit Manager\nMark K. Carder, Senior Auditor\nK. 
Kevin Liu, Lead Information Technology Specialist\nHung Q. Dam, Information Technology Specialist\nArlene Feskanich, Information Technology Specialist\n\nAppendix III\n\nReport Distribution List\n\nCommissioner C\nOffice of the Commissioner \xe2\x80\x93 Attn: Chief of Staff C\nDeputy Commissioner for Operations Support OS\nAssociate Chief Information Officer, Applications Development OS:CTO:AD\nAssociate Chief Information Officer, Enterprise Operations OS:CTO:EO\nAssociate Chief Information Officer, Cybersecurity OS:CTO:C\nAssociate Chief Information Officer, Modernization Program Management Office OS:CTO:MP\nDirector, Security Risk Management OS:CTO:C:SRM\nChief Counsel CC\nNational Taxpayer Advocate TA\nDirector, Office of Legislative Affairs CL:LA\nDirector, Office of Program Evaluation and Risk Analysis RAS:O\nOffice of Internal Control OS:CFO:CPIC:IC\nAudit Liaisons:\n       Commissioner, Wage and Investment Division SE:W:S:PRA:PEI\n       Director, Risk Management Division OS:CTO:SP:RM\n\n
Appendix IV\n\nManagement\xe2\x80\x99s Response to the Draft Report 1\n\n1 The final audit report title was revised based on discussions with the IRS after the issuance of the draft report.\n'